All Questions

1 vote, 0 answers, 19 views

How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?

I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes: Preprocessing (handling missing values, scaling, encoding), Feature Engineering ...
Jake Ferris
1 vote, 1 answer, 207 views

How does a Decision Tree split when two features are tied?

Decision Trees split based on which feature and which cut-off value creates the largest mean decrease in impurity (assuming hyperparameters splitter="best", criterion="gini"). Now take ...
AvanishM
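
The impurity-decrease criterion mentioned in the excerpt above can be illustrated with a minimal pure-Python sketch. It is not scikit-learn's implementation: the helper names (`gini`, `impurity_decrease`, `best_split`) and the toy data are hypothetical, and the tie rule used here (keep the first candidate found) is an assumption made for the sketch. In scikit-learn itself, the documentation notes that features are randomly permuted at each split, so which of two tied features is chosen can depend on `random_state`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Parent Gini minus the size-weighted Gini of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_split(X, y):
    """Return (feature_index, threshold, decrease) with the largest decrease.

    Candidate thresholds are midpoints between consecutive unique values.
    Ties keep the first candidate found (an assumption of this sketch).
    """
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        uniq = sorted(set(column))
        for lo, hi in zip(uniq, uniq[1:]):
            threshold = (lo + hi) / 2
            decrease = impurity_decrease(column, y, threshold)
            if best is None or decrease > best[2]:
                best = (j, threshold, decrease)
    return best


# Toy data (hypothetical): both features are identical, so they tie exactly.
X = [[1, 1], [2, 2], [3, 3], [4, 4]]
y = [0, 0, 1, 1]

print(best_split(X, y))  # (0, 2.5, 0.5): feature 0 wins only because it is scanned first
```

Changing the comparison from `>` to `>=` would make the last tied candidate win instead, which shows that the outcome of a tie is decided by scan order rather than by the data.
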
1 vote, 0 answers, 59 views

sklearn - OneHotEncoding and SelectPercentile

In a sklearn example there is code ...
Maciej778
1 vote, 1 answer, 175 views

Integration of Feature Selection in Pipeline

I have noticed that integrating feature selection in a pipeline alters results. Pipeline 1 gives slightly different results from pipeline 2. Why should this be so? Pipeline 2 ...
wwnde
1 vote, 0 answers, 73 views

How does recursive feature elimination with cross-validation work internally?

I am trying to understand how recursive feature elimination with cross-validation works (RFECV in sklearn). Let's say that we have 10 features, and we perform <...
Antonios Sarikas
1 vote, 1 answer, 853 views

Does sklearn perform feature selection within cross validation?

I would like to add a feature selector to my pipeline and use GridSearchCV to tune the hyperparameters of both the selector and the classifier(s). I am wondering if sklearn performs feature selection ...
Antonios Sarikas
0 votes, 1 answer, 619 views

How does SelectFromModel from scikit-learn select features?

When I use XGBClassifier with SelectFromModel the algorithm always returns around five features regardless of the ...
N_Z
0 votes, 1 answer, 87 views

Encoding a categorical feature with high cardinality - in my case IP addresses

I'm working on an intrusion detection project and have many categorical features. For some I used label encoding, since they don't have many possible values. But for IP addresses, it's a high cardinality ...
biihu
2 votes, 0 answers, 81 views

Feature selection and model performance

Featuretools provides an automated way to generate features from your data, by providing relationships within your data and applying their so-called deep feature synthesis. It generates features like ...
holzben
1 vote, 0 answers, 38 views

How to return selected features with different feature selection models?

I use the function below to measure the effect of these feature selection models on my data, and it works perfectly. What I want is to return the names of the selected features for each model. Is there any ...
N_Z
2 votes, 1 answer, 192 views

What are the differences between the feature selection methods below?

Do the code snippets below do the same thing? If not, what are the differences? ...
N_Z
1 vote, 0 answers, 165 views

Using f_regression to find the most significant features

We are trying to use SelectKBest with the f_regression scoring function on a pool of 1000 numerical features to solve a regression problem. Also, we wanted to parallelize the execution of SelectKBest and we ...
Atul Mishra
2 votes, 1 answer, 2k views

How to deal with date features in linear regression?

I need some help with a project. I have a dataframe like this:
YEAR  MONTH  INDICATOR_1  INDICATOR_2  INDICATOR_3
2014  3      0.123        0.495        0.222
My goal is to predict all of the indicators for the next year (...
Alan CUZON
0 votes, 1 answer, 308 views

How do I fine-tune model performance after the initial run? (Scikit-Learn)

I've just started learning regression using scikit-learn and stumbled upon a problem. For a given dataset, let's say that I've imputed the missing data and one-hot encoded all categorical features. ...
Garreth Lee
0 votes, 1 answer, 100 views

Correlation with target variable for regression problem

Given the following dataframe
   age  job       salary
0  1    Doctor    100
1  2    Engineer  200
2  3    Lawyer    300
... with ...
william007
